Goto

Collaborating Authors

 infeasible case


Testing the Feasibility of Linear Programs with Bandit Feedback

arXiv.org Machine Learning

While the recent literature has seen a surge in the study of constrained bandit problems, all existing methods for these begin by assuming the feasibility of the underlying problem. We initiate the study of testing such feasibility assumptions, and in particular address the problem in the linear bandit setting, thus characterising the costs of feasibility testing for an unknown linear program using bandit feedback. Concretely, we test if $\exists x: Ax \ge 0$ for an unknown $A \in \mathbb{R}^{m \times d}$, by playing a sequence of actions $x_t\in \mathbb{R}^d$, and observing $Ax_t + \mathrm{noise}$ in response. By identifying the hypothesis as determining the sign of the value of a minimax game, we construct a novel test based on low-regret algorithms and a nonasymptotic law of iterated logarithms. We prove that this test is reliable, and adapts to the `signal level,' $\Gamma,$ of any instance, with mean sample costs scaling as $\widetilde{O}(d^2/\Gamma^2)$. We complement this by a minimax lower bound of $\Omega(d/\Gamma^2)$ for sample costs of reliable tests, dominating prior asymptotic lower bounds by capturing the dependence on $d$, and thus elucidating a basic insight missing in the extant literature on such problems.


An Asymptotically Optimal Algorithm for the Convex Hull Membership Problem

arXiv.org Machine Learning

Multi-armed bandit (MAB) problem has been increasingly recognized as a fundamental and powerful framework in decision theory, where an agent's objective is to make a series of decisions to pull an arm of a K slot machine in order to maximize the total reward. Each of the arms is associated with a fixed but unknown probability distribution [5, 31]. An enormous literature has accumulated over the past decades on the MAB problem, such as clinical trials and drug testing [6, 19], recommendation system and online advertising [7, 9, 42, 51, 56], information retrieval [8, 37], and finance [24, 39, 40, 48]. From a theoretical perspective, the MAB problem was first studied in the seminal work [44] and followed by a vast line of work to study in regret minimization [2, 4, 5, 10, 14, 18, 32, 34, 49, 53] and pure exploration [11, 22, 36, 46]. In this paper, we investigate the convex hull membership (CHM) problem in a pure exploration setting, where a learner sequentially performs experiments in a stochastic multi-armed bandit environment to identify if a given point lies in the convex hull of means of K arms as efficiently and accurately as possible. In particular, we are interested in the minimum expected number of samples required to identify the convex hull membership with high probability (at least 1 δ), without considering a reward/cost structure. The non-stochastic version of the convex hull membership problem is well studied in [20] and has attracted significant attention in different scientific areas and proven its crucial applications in image processing [25, 55], robot motion planning [33, 50] and pattern recognition [27, 45].